Gaussian Mixture Embeddings for Multiple Word Prototypes

نویسندگان

Xinchi Chen

Xipeng Qiu

Jingxiang Jiang

Xuanjing Huang

چکیده

Recently, word representation has been increasingly focused on for its excellent properties in representing the word semantics. Previous works mainly suffer from the problem of polysemy phenomenon. To address this problem, most of previous models represent words as multiple distributed vectors. However, it cannot reflect the rich relations between words by representing words as points in the embedded space. In this paper, we propose the Gaussian mixture skip-gram (GMSG) model to learn the Gaussian mixture embeddings for words based on skip-gram framework. Each word can be regarded as a gaussian mixture distribution in the embedded space, and each gaussian component represents a word sense. Since the number of senses varies from word to word, we further propose the Dynamic GMSG (D-GMSG) model by adaptively increasing the sense number of words during training. Experiments on four benchmarks show the effectiveness of our proposed model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs

Gaussian LDA integrates topic modeling with word embeddings by replacing discrete topic distribution over word types with multivariate Gaussian distribution on the embedding space. This can take semantic information of words into account. However, the Euclidean similarity used in Gaussian topics is not an optimal semantic measure for word embeddings. Acknowledgedly, the cosine similarity better...

متن کامل

Word Embeddings with Multiple Word Prototypes

The ability to accurately represent word vectors to capture syntactic and semantic similarity is central to Natural language processing. Thus, there is rising interest in vector space word embeddings and their use especially given recent methods for their fast estimation at very large scale. However almost all recent works assume a single representation for each word type, completely ignoring p...

متن کامل

DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output

We describe our system (DT Team) submitted at SemEval-2017 Task 1, Semantic Textual Similarity (STS) challenge for English (Track 5). We developed three different models with various features including similarity scores calculated using word and chunk alignments, word/sentence embeddings, and Gaussian Mixture Model (GMM). The correlation between our system’s output and the human judgments were ...

متن کامل

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic i...

متن کامل

Identity-sensitive Word Embedding through Heterogeneous Networks

Most existing word embedding approaches do not distinguish the same words in different contexts, therefore ignoring their contextual meanings. As a result, the learned embeddings of these words are usually a mixture of multiple meanings. In this paper, we acknowledge multiple identities of the same word in different contexts and learn the identitysensitive word embeddings. Based on an identity-...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

CoRR

دوره abs/1511.06246 شماره

صفحات -

تاریخ انتشار 2015

Gaussian Mixture Embeddings for Multiple Word Prototypes

نویسندگان

چکیده

منابع مشابه

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs

Word Embeddings with Multiple Word Prototypes

DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output

Multimodal Word Distributions

Identity-sensitive Word Embedding through Heterogeneous Networks

عنوان ژورنال:

اشتراک گذاری